{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Kernel change point detection: a performance comparison\n",
    "\n",
    "<!-- {{ add_binder_block(page) }} -->\n",
    "\n",
    "## Introduction\n",
    "\n",
    "In `ruptures`, there are two ways to perform kernel change point detection:\n",
    "\n",
    "- by using the pure Python classes [Dynp](../user-guide/detection/dynp.md) (known number of change points) and [Pelt](../user-guide/detection/pelt.md) (unknown number of change points),\n",
    "\n",
    "- by using the faster class (implemented in C) [KernelCPD](../../user-guide/detection/kernelcpd)  which contains both the dynamic programming approach and the penalized approach (PELT).\n",
    "\n",
    "This example illustrates the performance of the fast C implementation compared to the pure Python one.\n",
    "\n",
    "The kernel change point detection setting is briefly described in the [user guide](../../user-guide/detection/kernelcpd).\n",
    "The interested reader can refer to [[Celisse2018](#Celisse2018), [Arlot2019](#Arlot2019)] for a more complete introduction.<br>\n",
    "\n",
    "The list of available kernels is available [here](../../user-guide/detection/kernelcpd#available-kernels), but in this example we only consider two:\n",
    "\n",
    "- the linear kernel, $k_{\\text{linear}}(x, y) = x^T y$ (Euclidean scalar product) and the induced norm is the Euclidean norm;\n",
    "- the Gaussian kernel (also known as radial basis function, rbf), $k_{\\text{Gaussian}}(x,y)=\\exp(-\\gamma \\|x-y\\|^2)$ where $\\|\\cdot\\|$ is the Euclidean norm and $\\gamma>0$ is a user-defined parameter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we make the necessary imports and generate a toy signal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time  # for execution time comparison\n",
    "\n",
    "import matplotlib.pyplot as plt  # for display purposes\n",
    "\n",
    "import ruptures as rpt  # our package\n",
    "from ruptures.metrics import hausdorff\n",
    "\n",
    "# generate signal\n",
    "n_samples, dim, sigma = 500, 3, 3\n",
    "n_bkps = 6  # number of breakpoints\n",
    "signal, bkps = rpt.pw_constant(n_samples, dim, n_bkps, noise_std=sigma)\n",
    "fig, ax_array = rpt.display(signal, bkps)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear kernel\n",
    "\n",
    "The linear kernel (see above) $k_{\\text{linear}}$ can detect changes in the mean of a signal.\n",
    "It also corresponds to the cost function [`CostL2`](../../user-guide/costs/costl2).\n",
    "\n",
    "### Dynamic programming\n",
    "\n",
    "When the number of changes to detect is known beforehand, we use dynamic programming."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "algo_python = rpt.Dynp(model=\"l2\", jump=1, min_size=2).fit(\n",
    "    signal\n",
    ")  # written in pure python\n",
    "algo_c = rpt.KernelCPD(kernel=\"linear\", min_size=2).fit(signal)  # written in C\n",
    "\n",
    "for label, algo in zip(\n",
    "    (\"Python implementation\", \"C implementation\"), (algo_python, algo_c)\n",
    "):\n",
    "    start_time = time.time()\n",
    "    result = algo.predict(n_bkps=n_bkps)\n",
    "    print(f\"{label}:\\t{time.time() - start_time:.3f} s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The speed-up is quite significant and depends on the signal size (number $T$ of samples and dimension $d$) and the number $K$ of change points to detect.\n",
    "The C implementation has a time complexity of the order $\\mathcal{O}(KdT^2)$ and space complexity of the order $\\mathcal{O}(T)$.\n",
    "As to the Python implementation, the complexities in time and space are of the order $\\mathcal{O}(KdT^3)$ and $\\mathcal{O}(T^2)$ respectively.\n",
    "\n",
    "We can also check that both methods return the same set of change points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bkps_python = algo_python.predict(n_bkps=n_bkps)\n",
    "bkps_c = algo_c.predict(n_bkps=n_bkps)\n",
    "print(f\"Python implementation:\\t{bkps_python}\")\n",
    "print(f\"C implementation:\\t{bkps_c}\")\n",
    "print(f\"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PELT\n",
    "\n",
    "When the number of changes to detect is unknown, we resort to PELT [[Killick2012]](#Killick2012) to solve the penalized detection problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "algo_python = rpt.Pelt(model=\"l2\", jump=1, min_size=2).fit(\n",
    "    signal\n",
    ")  # written in pure python\n",
    "algo_c = rpt.KernelCPD(kernel=\"linear\", min_size=2).fit(\n",
    "    signal\n",
    ")  # written in C, same class as before\n",
    "\n",
    "\n",
    "penalty_value = 100  # beta\n",
    "\n",
    "for label, algo in zip(\n",
    "    (\"Python implementation\", \"C implementation\"), (algo_python, algo_c)\n",
    "):\n",
    "    start_time = time.time()\n",
    "    result = algo.predict(pen=penalty_value)\n",
    "    print(f\"{label}:\\t{time.time() - start_time:.3f} s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, the speed-up is quite significant and depends on the signal size (number $T$ of samples and dimension $d$) and the penalty value $\\beta$.\n",
    "We remark that, for both Python and C implementations, PELT is more efficient then dynamic programming.\n",
    "\n",
    "We can also check that both methods return the same set of change points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bkps_python = algo_python.predict(pen=penalty_value)\n",
    "bkps_c = algo_c.predict(pen=penalty_value)\n",
    "print(f\"Python implementation:\\t{bkps_python}\")\n",
    "print(f\"C implementation:\\t{bkps_c}\")\n",
    "print(f\"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "!!! note\n",
    "    By default, `Dynp` and `Pelt` has `jump=5`.\n",
    "    In `KernelCPD`, `jump=1` and cannot be changed.\n",
    "    This is because, in the C implementation, changing the `jump` does not improve the running time significatively, while it does in the Python implementation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gaussian kernel\n",
    "\n",
    "The Gaussian kernel (see above) $k_{\\text{Gaussian}}$ can detect changes in the distribution of an i.i.d. process.\n",
    "This is a feature of several kernel functions (in particular *characteristics* kernels; see [[Gretton2012]](#Gretton2012) for more information).\n",
    "It also corresponds to the cost function [`CostRbf`](../../user-guide/costs/costrbf).\n",
    "\n",
    "### Dynamic programming\n",
    "\n",
    "When the number of changes to detect is known beforehand, we use dynamic programming."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {\"gamma\": 1e-2}\n",
    "algo_python = rpt.Dynp(model=\"rbf\", params=params, jump=1, min_size=2).fit(\n",
    "    signal\n",
    ")  # written in pure python\n",
    "algo_c = rpt.KernelCPD(kernel=\"rbf\", params=params, min_size=2).fit(\n",
    "    signal\n",
    ")  # written in C\n",
    "\n",
    "for label, algo in zip(\n",
    "    (\"Python implementation\", \"C implementation\"), (algo_python, algo_c)\n",
    "):\n",
    "    start_time = time.time()\n",
    "    result = algo.predict(n_bkps=n_bkps)\n",
    "    print(f\"{label}:\\t{time.time() - start_time:.3f} s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, the speed-up is quite significant.\n",
    "The C implementation has a time complexity of the order $\\mathcal{O}(CKT^2)$ and space complexity of the order $\\mathcal{O}(T)$, where $C$ is the complexity of computing $k(y_s, y_t)$ once.\n",
    "As to the Python implementation, the complexities in time and space are of the order $\\mathcal{O}(CKT^4)$ and $\\mathcal{O}(T^2)$ respectively.\n",
    "\n",
    "We can also check that both methods return the same set of change points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bkps_python = algo_python.predict(n_bkps=n_bkps)\n",
    "bkps_c = algo_c.predict(n_bkps=n_bkps)\n",
    "print(f\"Python implementation:\\t{bkps_python}\")\n",
    "print(f\"C implementation:\\t{bkps_c}\")\n",
    "print(f\"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)\")"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "!!! note\n",
    "    If not provided by the user, the `gamma` parameter is chosen using the median heuristics, meaning that it is set to inverse of the median of all pairwise products $k(y_s, y_t)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### PELT\n",
    "\n",
    "When the number of changes to detect is unknown, we resort to PELT to solve the penalized detection problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "algo_python = rpt.Pelt(model=\"rbf\", jump=1, min_size=2).fit(\n",
    "    signal\n",
    ")  # written in pure python\n",
    "algo_c = rpt.KernelCPD(kernel=\"rbf\", min_size=2).fit(\n",
    "    signal\n",
    ")  # written in C, same class as before\n",
    "\n",
    "\n",
    "penalty_value = 1  # beta\n",
    "\n",
    "for label, algo in zip(\n",
    "    (\"Python implementation\", \"C implementation\"), (algo_python, algo_c)\n",
    "):\n",
    "    start_time = time.time()\n",
    "    result = algo.predict(pen=penalty_value)\n",
    "    print(f\"{label}:\\t{time.time() - start_time:.3f} s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, the speed-up is quite significant and depends on the signal size (number $T$ of samples and dimension $d$) and the penalty value $\\beta$.\n",
    "We remark that, for both Python and C implementations, PELT is more efficient then dynamic programming.\n",
    "\n",
    "We can also check that both methods return the same set of change points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bkps_python = algo_python.predict(pen=penalty_value)\n",
    "bkps_c = algo_c.predict(pen=penalty_value)\n",
    "print(f\"Python implementation:\\t{bkps_python}\")\n",
    "print(f\"C implementation:\\t{bkps_c}\")\n",
    "print(f\"(Hausdorff distance: {hausdorff(bkps_python, bkps_c):.0f} samples)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "<a id=\"Gretton2012\">[Gretton2012]</a>\n",
    "Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13, 723–773.\n",
    "\n",
    "<a id=\"Killick2012\">[Killick2012]</a>\n",
    "Killick, R., Fearnhead, P., & Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.\n",
    "\n",
    "<a id=\"Celisse2018\">[Celisse2018]</a>\n",
    "Celisse, A., Marot, G., Pierre-Jean, M., & Rigaill, G. (2018). New efficient algorithms for multiple change-point detection with reproducing kernels. Computational Statistics and Data Analysis, 128, 200–220.\n",
    "\n",
    "<a id=\"Arlot2019\">[Arlot2019]</a>\n",
    "Arlot, S., Celisse, A., & Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. Journal of Machine Learning Research, 20(162), 1–56.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
